Methods of Data Analysis: Learning probability distributions

Abstract

One of the key problems in non-parametric data analysis is to infer good models of probability distributions, given only a finite sample drawn from the distribution in question. This problem is ill-posed for continuous distributions, even when they are low-dimensional: with finite data, there is no way to distinguish between (or exclude) distributions that are not regularized a priori, for example, by assuming that they are smooth in some sense. The question then becomes how to formulate mathematically the problem of choosing the “best” distribution among those that are “sufficiently smooth”; intuitively, more data should allow us to consider less smooth distributions. One way of constructing such continuous distribution estimates is kernel density estimation (KDE), which literally involves smoothing (or convolving) the data with a particular kernel function. Assuming a metric on the state space, the kernel specifies how far away from the individual data points their contributions to the estimated distribution extend. Another possibility is to describe the data with a parametric family of distributions that is sufficiently rich so that, in the large-sample limit, it can describe an arbitrary smooth distribution. A well-known example of this class is the mixture of Gaussians. Both of these approaches are reasonably simple to implement, both can be seen as instances of probabilistic (maximum likelihood or Bayesian) inference for their parameters, and both belong to the standard set of unsupervised learning tools for data exploration and modeling. Yet another versatile framework for modeling both continuous and discrete distributions of potentially high dimensionality from a finite sample is the maximum entropy (ME) approach. Here, we look for the most random distribution (the one with maximum entropy) that exactly reproduces a chosen set of statistics which can be reliably estimated from the data.
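
The kernel density estimate described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: it assumes one-dimensional data, a Gaussian kernel, and a hand-picked bandwidth of 0.4, and the function name `gaussian_kde_1d` and the synthetic sample are hypothetical choices made only for this example.

```python
import math
import random


def gaussian_kde_1d(samples, x, bandwidth):
    """Evaluate a Gaussian kernel density estimate at the point x.

    Each sample contributes a normalized Gaussian bump of width
    `bandwidth`; the estimate is the average of these bumps.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


# Synthetic data (an assumption for illustration): 200 draws from N(0, 1).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]

# Evaluate the estimate on a regular grid covering [-10, 10].
step = 0.05
grid = [-10.0 + step * i for i in range(401)]
density = [gaussian_kde_1d(data, x, bandwidth=0.4) for x in grid]

# Riemann-sum check that the estimate integrates to (approximately) one.
total_mass = step * sum(density)
```

The bandwidth plays the role of the smoothness assumption: a larger value spreads each data point's contribution further and yields a smoother estimate, while a smaller value follows the sample more closely.
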
The assumption of maximum entropy is a formal version of Occam’s razor: one chooses distributions of a particular form that contain a minimal amount of structure that is nevertheless sufficient to explain selected aspects of the observations (the constraints). Seen another way, the maximum entropy assumption is just another form of regularization, the intuition being that the most regular, smooth distributions are the uniform distributions, which maximize the entropy in the absence of further constraints. What makes maximum entropy models special are certain uniqueness theorems and links to information-theoretic quantities. Going beyond these standard tools, one can consider unsupervised deep learning models, restricted Boltzmann machines, field-theoretic methods for learning probability distributions, and Gaussian processes (in a supervised setting), which we will consider in the next lecture.
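
As a small worked instance of the maximum entropy idea, consider a discrete variable with a single constrained statistic, its mean. The sketch below is an illustration under stated assumptions, not the lecture's own code: the six-state "die" example, the target means, and the function name `maxent_given_mean` are hypothetical. It relies on the standard result that the maximum entropy distribution with a fixed mean has the exponential form p(x) ∝ exp(λx), and finds the multiplier λ by bisection.

```python
import math


def maxent_given_mean(states, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum entropy distribution on `states` with a prescribed mean.

    The solution is p(x) proportional to exp(lam * x). The model mean
    grows monotonically with lam, so lam can be found by bisection.
    """
    def model(lam):
        # Subtract the largest exponent for numerical stability.
        shift = max(lam * x for x in states)
        weights = [math.exp(lam * x - shift) for x in states]
        z = sum(weights)
        probs = [w / z for w in weights]
        mean = sum(p * x for p, x in zip(probs, states))
        return probs, mean

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        _, mean = model(mid)
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    probs, _ = model(lam)
    return lam, probs


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


states = [1, 2, 3, 4, 5, 6]
# A mean of 3.5 imposes no extra structure: the result is uniform.
lam_uniform, p_uniform = maxent_given_mean(states, 3.5)
# A mean of 4.5 tilts the distribution toward the larger states.
lam_skewed, p_skewed = maxent_given_mean(states, 4.5)
skewed_mean = sum(p * x for p, x in zip(p_skewed, states))
```

With the unconstrained mean of 3.5 the multiplier vanishes and the uniform distribution is recovered, which matches the regularization intuition above; any other target mean lowers the entropy only as much as the constraint requires.
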

Publication date: 2016